Overview

The goal of this study is to retrospectively determine the factors that influenced the spatiotemporal spread of COVID-19 throughout the United States during the first wave of the pandemic. Specifically, we aim to explain the role of county-level attributes and county-county mobility patterns in the spread of COVID-19. The model can also aid in predicting future spatial spread in the United States in the event of regional containment.

Our approach involves fitting a stochastic model that predicts the rate of COVID-19 importation into new counties in the United States. The model is updated daily from March 1, 2020 to August 3, 2020. The time of infection for each county is taken from county-level COVID-19 case data compiled by the New York Times from state and local health agency reports. At each time step, the probabilities of COVID-19 importation into every potential receiving county from every potential transmitting (source) county are defined by a generalized gravity model.

County-level attributes that we considered include:

  • population size
  • COVID-19 cases
  • non-pharmaceutical interventions in place (mask mandates, stay-at-home orders, gathering size limits, and bar closures).

County-county mobility patterns and connections that we considered include:

  • residence-workplace commuting flow information from the American Community Survey (ACS)
  • estimated daily domestic flight passenger volume based on OAG data from 2019 and from the pandemic period (March 2020-July 2020)
  • the Facebook Social Connectedness Index (SCI).

The results in this summary are from the best-fit model, which estimates the probability of COVID-19 importation based on:

  • county population sizes
  • distances between counties
  • total COVID-19 cases reported in the previous 10 days in the source county
  • the estimated number of commuters between counties based on the ACS
  • the estimated number of daily flight passengers traveling between counties from March 2020-July 2020
  • four non-pharmaceutical interventions in place in source counties:
    • bar closures
    • stay-at-home orders
    • mask requirements
    • gathering restrictions
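The "cases reported in the previous 10 days" predictor listed above can be derived from a cumulative case series such as the one the New York Times publishes. The following is a minimal sketch, not the study's code: the function name is hypothetical, and it assumes the window covers the 10 days before the current day (the current day itself is excluded), which the summary does not specify.

```python
def trailing_case_totals(cumulative_cases, window=10):
    """Return, for each day t, the new cases reported on days t-window .. t-1.

    cumulative_cases: list of cumulative reported case counts, one entry per day.
    Assumption: the current day is excluded from its own window.
    """
    totals = []
    for t in range(len(cumulative_cases)):
        # Cumulative count at the end of the window (day t-1).
        end = cumulative_cases[t - 1] if t >= 1 else 0
        # Cumulative count just before the window starts (day t-window-1).
        start_idx = t - window - 1
        start = cumulative_cases[start_idx] if start_idx >= 0 else 0
        # Cumulative series are occasionally revised downward; clamp at zero.
        totals.append(max(end - start, 0))
    return totals


# Toy series with a 2-day window for readability.
print(trailing_case_totals([0, 1, 3, 3, 6], window=2))
```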

Map: Outbreak probability by county

The following map shows the model-predicted probability that each county reports its first case in the next period. Probabilities change over time as underlying conditions, such as the number of cases in neighboring counties, change. Use the slider to show probabilities for a different day. Counties turn gray once they report their first case.
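The summary does not state how pairwise importation probabilities are combined into a single county-level probability for the map. One common approach, shown here only as an illustrative sketch, is to assume that importations from different source counties are independent, so that a county stays uninfected only if no source seeds it. The function name and the independence assumption are ours, not the study's.

```python
def first_case_probability(p_from_sources):
    """Probability that at least one source county seeds the receiving county.

    p_from_sources: iterable of pairwise importation probabilities p_ij, one per
    currently infected source county i. Assumes independence across sources.
    """
    prob_none = 1.0
    for p in p_from_sources:
        prob_none *= 1.0 - p
    return 1.0 - prob_none


# Two hypothetical sources, each with a 50% chance of seeding the county.
print(first_case_probability([0.5, 0.5]))
```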

Map: County-county commuting flows

According to the best-fit model, the number of commuters between counties is positively associated with COVID-19 transmission. The commuting flows are estimates from the 2011-2015 ACS commuting data published by the US Census Bureau. These connections are predominantly short-distance commutes between cities and their surrounding suburbs, but they also include notable long-distance commuting flows.

Fig. 3: The lines connect the county pairs in the top 0.1% (n = 4944) of pairwise county-county commuting flows.
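Selecting the strongest fraction of pairwise flows for a figure like this one amounts to ranking the pairs and keeping the top share. A minimal sketch follows, with a hypothetical function name and toy data; the study's actual filtering code is not shown in this summary.

```python
def top_fraction(flows, fraction):
    """Return the strongest `fraction` of pairwise flows, largest first.

    flows: dict mapping (origin, destination) -> flow size.
    fraction: share of pairs to keep, e.g. 0.001 for the top 0.1%.
    """
    n_keep = max(1, round(len(flows) * fraction)) if flows else 0
    ranked = sorted(flows.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n_keep]


# Toy example: ten county pairs, keep the top 20%.
toy_flows = {("A", f"B{k}"): k * 10 for k in range(10)}
print(top_fraction(toy_flows, 0.2))
```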

Map: County-county domestic flight passenger flows

The volume of county-county domestic flight passengers is also positively associated with COVID-19 transmission. Data on domestic flight passenger volume are from the Official Airline Guide (OAG) and are available as monthly passenger totals for each flight path. Passengers were allocated to counties in the catchment areas surrounding airports in proportion to each county’s population and distance to the airport: counties farther from the airport and counties with smaller populations received a smaller share of passengers.
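The exact allocation rule is not given in this summary. The sketch below illustrates one rule with the stated qualitative behavior (shares fall with distance and rise with population). The weight `population / (1 + distance_km)`, the 100 km catchment radius, and all names are assumptions made for illustration only.

```python
def allocate_passengers(passengers, counties, radius_km=100.0):
    """Split one airport's passenger total across counties in its catchment area.

    counties: dict mapping county name -> (population, distance_km to the airport).
    Weight = population / (1 + distance_km), an illustrative choice and not the
    study's formula. Counties beyond radius_km receive no passengers.
    """
    weights = {
        name: pop / (1.0 + dist)
        for name, (pop, dist) in counties.items()
        if dist <= radius_km
    }
    total = sum(weights.values())
    if total == 0:
        return {}
    return {name: passengers * w / total for name, w in weights.items()}


# Hypothetical airport with three nearby counties; "C" lies outside the catchment.
shares = allocate_passengers(
    1100, {"A": (1000, 0.0), "B": (1000, 9.0), "C": (5000, 500.0)}
)
print(shares)
```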

We originally fit the model to a static data set of mean 2019 flight passenger volume, which improved on a model without any flight data. However, a model fit to time-varying flight volume from the pandemic period (March 2020-July 2020) outperformed the model fit to the 2019 data alone. This is evidence that relative county-county flight passenger flows varied across the pandemic months and that passenger volume changed unequally across routes relative to baseline. Fig. 4 illustrates this: some paths returned close to the 2019 baseline quickly, whereas others remained far below baseline even in July (e.g., flights to counties in Hawaii and New York City). Because passenger volume data are not available in real time, historical flight volume averages remain a useful substitute if the model is used to predict future COVID-19 transmission risk.

Fig. 4: The lines connect the county pairs in the top 0.01% of pairwise connections by 2019 volume (n = 494), colored by each month’s volume as a percentage of the 2019 baseline.
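The quantity used for coloring is a simple ratio of monthly volume to baseline. As a small sketch, assuming the baseline for a route is its mean monthly 2019 volume (the function name and the numbers are hypothetical):

```python
def percent_of_baseline(monthly_volume, baseline_volume):
    """Express each month's passenger volume as a percentage of the 2019 baseline.

    monthly_volume: dict mapping month label -> passenger volume on one route.
    baseline_volume: that route's 2019 baseline volume (assumed to be a monthly mean).
    """
    if baseline_volume <= 0:
        raise ValueError("baseline_volume must be positive")
    return {
        month: 100.0 * volume / baseline_volume
        for month, volume in monthly_volume.items()
    }


# Hypothetical route with a 2019 baseline of 400 passengers per month.
print(percent_of_baseline({"2020-04": 50, "2020-07": 200}, 400))
```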

Model details

The results in this summary are from the best-fit model, which estimates the probability of COVID-19 importation from source county \(i\) to receiving county \(j\) based on county population sizes (\(mass_{ij}=pop_i*pop_j\)), the distance between the counties (\(dist_{ij}\)), total COVID-19 cases reported in the previous 10 days in the source county \(i\) (\(cases_i\)), the estimated number of commuters between counties \(i\) and \(j\) based on the American Community Survey (\(acs_{ij}\)), the estimated number of daily flight passengers traveling between counties \(i\) and \(j\) from March 2020-July 2020 based on Official Airline Guide (OAG) data (\(flights_{ij}\)), and four non-pharmaceutical interventions in place in source county \(i\) (\(bars_i\), \(sah_i\), \(mask_i\), \(gather_i\)).1

\[ p_{ij} = \frac{1}{1 + e^{\beta_0 + \frac{mass^{\beta_2}_{ij} cases^{\beta_3}_i}{dist^{\beta_1}_{ij}} + \beta_4\log(acs_{ij} + 1) + \beta_5\log(flights_{ij}) + \beta_6 bars_i + \beta_7 sah_i + \beta_8 mask_i + \beta_9 gather_i}} \tag{1} \]
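To make the structure of Eq. 1 concrete, the sketch below evaluates it for a single county pair. It is an illustration under stated assumptions rather than the study's implementation: the function name is hypothetical, the interventions are assumed to be coded as 0/1 indicators, the units of distance are not specified in this summary, and the coefficient values in the example are placeholders, not the fitted estimates.

```python
import math


def importation_probability(beta, pop_i, pop_j, dist_ij, cases_i,
                            acs_ij, flights_ij, bars_i, sah_i, mask_i, gather_i):
    """Evaluate Eq. 1 for one source county i and receiving county j.

    beta: sequence of ten coefficients, beta[0] .. beta[9], indexed as in Eq. 1.
    bars_i, sah_i, mask_i, gather_i: 0/1 indicators for the source county
    (an assumed coding).
    """
    if min(dist_ij, cases_i, flights_ij) <= 0:
        # The powers and log(flights) in Eq. 1 are undefined at zero. How the
        # study treats such pairs is not stated, so this sketch rejects them.
        raise ValueError("dist_ij, cases_i and flights_ij must be positive")
    mass_ij = pop_i * pop_j
    gravity = mass_ij ** beta[2] * cases_i ** beta[3] / dist_ij ** beta[1]
    eta = (beta[0] + gravity
           + beta[4] * math.log(acs_ij + 1)
           + beta[5] * math.log(flights_ij)
           + beta[6] * bars_i + beta[7] * sah_i
           + beta[8] * mask_i + beta[9] * gather_i)
    if eta > 700:
        # exp would overflow; the probability is effectively zero.
        return 0.0
    return 1.0 / (1.0 + math.exp(eta))


# Placeholder coefficients and inputs, chosen only to exercise the function.
example_beta = [2.0, 1.0, 0.1, 0.1, -0.5, -0.2, 0.5, 0.5, 0.5, 0.5]
print(importation_probability(example_beta, 50_000, 200_000, 120.0, 35,
                              400, 12.0, 0, 1, 1, 0))
```

Because the exponent enters Eq. 1 as \(e^{+(\cdot)}\) in the denominator, a larger value of the exponent gives a smaller importation probability.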

Models are fit using maximum likelihood estimation and the best model is selected using AIC.
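The likelihood used in the study is not written out in this summary. As a hedged sketch of the selection step, the code below computes a log-likelihood under the assumption that each county-pair observation is a Bernoulli outcome (importation or no importation) and then applies the standard definition AIC = 2k - 2 ln L, where k is the number of fitted parameters. The function names and toy numbers are hypothetical.

```python
import math


def bernoulli_log_likelihood(probs, outcomes):
    """Log-likelihood of 0/1 outcomes given predicted probabilities.

    Assumes independent Bernoulli observations (an illustrative assumption).
    """
    eps = 1e-12  # keep log() finite when a prediction is exactly 0 or 1
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if y else math.log(1.0 - p)
    return total


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2 * n_params - 2 * log_likelihood


# Toy comparison of two hypothetical models on the same four observations.
outcomes = [1, 0, 0, 1]
ll_small = bernoulli_log_likelihood([0.6, 0.4, 0.4, 0.6], outcomes)
ll_large = bernoulli_log_likelihood([0.7, 0.3, 0.3, 0.7], outcomes)
print(aic(ll_small, n_params=3), aic(ll_large, n_params=10))
```

The penalty term 2k means that a model with more parameters is preferred only if it raises the log-likelihood enough to offset the added complexity.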

Parameter estimates

Table 1 contains the parameter estimates for the model specified by Eq. 1, which estimates that infection probability increases with population size and decreases with distance between counties. Higher numbers of COVID-19 cases in the source county are also associated with higher infection probability. Counties with higher commuting and domestic flight passenger flows between them also have a higher risk of COVID-19 transmission. All four interventions for which we have data are associated with a lower probability of COVID-19 spread from infected counties to uninfected counties.

Table 1. Parameter estimates for the best-fit model (Eq. 1)

model      const  dist   mass  cases  acs    flights  bars   sah   mask  gather
parameter  β0     β1     β2    β3     β4     β5       β6     β7    β8    β9
Model1     8.03   -1.77  -1    -0.7   -0.47  -0.16    -1.15  0.94  1.2   0.09

Comparing Interventions

We also use the fitted model to compare the effectiveness of four interventions (bar closures, stay-at-home orders, mask requirements, and gathering size limits) in reducing the probability that SARS-CoV-2 spreads from transmitting counties to receiving counties. Estimated effect sizes are shown in Figure 1. Three of the four interventions are associated with substantial reductions in spatial spread. Bar closures appear less effective, but this is likely because many counties closed bars around the same time they reported their first case. When we restrict attention to bar closures put in place before the first case (not depicted), the effect size is as large as for the other interventions. These fitted effect sizes suggest that multiple dimensions of precaution matter: the interventions do not act as proxies for a single axis of precautionary behavior, and each government action appears to be independently important.

Figure 1: Estimated effect sizes from four interventions for which we have data. Error bars show asymmetric likelihood profile standard errors. Larger values indicate that an intervention decreases the probability of spread in our model.

Limitations

The model is a closed system with US counties as the only potential sources of transmission; that is, it cannot account for international importations. To mitigate this, we use only data recorded after March 1, 2020, since new cases after this date are thought to result predominantly from widespread local transmission.2 Furthermore, testing criteria were expanded on March 4 to include individuals without international travel history.3 Testing availability remained limited for some time, so the infection times we fit are likely biased later than the true infection times. We can be confident in our model estimates only insofar as case ascertainment did not vary systematically across geographies.


  1. In another model (not shown here), we included the Facebook Social Connectedness Index, but it did not improve our estimates.

  2. Davis JT, Chinazzi M, Perra N, et al. Estimating the establishment of local transmission and the cryptic phase of the COVID-19 pandemic in the USA. Preprint. medRxiv. 2020;2020.07.06.20140285. Published 2020 Jul 7. doi:10.1101/2020.07.06.20140285

  3. CDC, “Updated Guidance on Evaluating and Testing Persons for Coronavirus Disease 2019 (COVID-19)”; https://emergency.cdc.gov/han/2020/han00429.asp.